Per-bucket concurrent rehashing algorithms
Author
Abstract
This paper describes a generic algorithm for concurrent resizing and on-demand per-bucket rehashing of an extensible hash table. In contrast to known lock-based hash table algorithms, the proposed algorithm separates the resizing and rehashing stages so that they neither invalidate existing buckets nor block concurrent operations. Instead, the rehashing work is deferred and split across subsequent operations on the table. Rehashing uses bucket-level synchronization only and therefore permits a race between a lookup and a move operation running in different threads. Rather than preventing this race with explicit synchronization, the algorithm detects it and restarts the lookup. Compared with other lock-based algorithms, the proposed algorithm reduces high-level synchronization on the hot path, improving the performance, concurrency, and scalability of the table. The response time of operations is also more predictable. The algorithm is compatible with cache-friendly bucket layouts and does not depend on any memory reclamation technique, so corresponding implementations can potentially gain additional performance.
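The ideas in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the class name `LazyRehashTable`, the rule that a new bucket pulls its items from the "parent" bucket obtained by clearing its top index bit, and the bucket-length growth trigger are all assumptions made for illustration. Because of Python's interpreter lock, the sketch shows only the control flow (deferred rehashing, bucket-level locks, restart on a detected race), not the performance characteristics.

```python
import threading


class LazyRehashTable:
    """Illustrative sketch only; every name here is hypothetical."""

    MAX_BUCKET_LEN = 4  # growth trigger, a simplification chosen for this sketch

    def __init__(self, initial_buckets=2):
        # The bucket count stays a power of two, so a bit mask picks the bucket.
        self.mask = initial_buckets - 1
        self.buckets = [{} for _ in range(initial_buckets)]
        self.rehashed = [True] * initial_buckets
        self.locks = [threading.Lock() for _ in range(initial_buckets)]
        self.resize_lock = threading.Lock()  # used only on the rare growth path

    def _grow(self, seen_mask):
        # Resizing only appends empty, not-yet-rehashed buckets. Existing
        # buckets stay in place and no items are moved here.
        with self.resize_lock:
            if self.mask != seen_mask:
                return  # another thread already grew the table
            n = seen_mask + 1
            self.buckets.extend([{} for _ in range(n)])
            self.rehashed.extend([False] * n)
            self.locks.extend([threading.Lock() for _ in range(n)])
            self.mask = 2 * n - 1  # publish the new mask last

    def _ensure_rehashed(self, idx):
        # Caller holds locks[idx]. Locks are always taken child before parent
        # (descending index), so the nesting cannot deadlock.
        if self.rehashed[idx]:
            return
        top_bit = 1 << (idx.bit_length() - 1)
        parent = idx ^ top_bit  # assumed parent rule: clear the top bit
        level_mask = (top_bit << 1) - 1
        with self.locks[parent]:
            self._ensure_rehashed(parent)
            src = self.buckets[parent]
            for key in [k for k in src if hash(k) & level_mask == idx]:
                self.buckets[idx][key] = src.pop(key)
        self.rehashed[idx] = True

    def get(self, key, default=None):
        h = hash(key)
        while True:
            idx = h & self.mask
            with self.locks[idx]:
                self._ensure_rehashed(idx)
                bucket = self.buckets[idx]
                if key in bucket:
                    return bucket[key]
                # Miss: if the table grew meanwhile and the key now maps to a
                # different bucket, the item may have been moved, so restart.
                if h & self.mask == idx:
                    return default

    def put(self, key, value):
        h = hash(key)
        while True:
            mask = self.mask
            idx = h & mask
            with self.locks[idx]:
                if h & self.mask != idx:
                    continue  # raced with a resize; retry with the new mask
                self._ensure_rehashed(idx)
                bucket = self.buckets[idx]
                bucket[key] = value
                too_long = len(bucket) > self.MAX_BUCKET_LEN
            if too_long:
                self._grow(mask)
            return
```

In this sketch, growing the table never touches existing buckets; a new bucket is filled the first time an operation lands on it. A lookup that misses re-reads the mask and restarts only if the key would now map to a different bucket, which is one simple way to detect the lookup-versus-move race the abstract mentions.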
Similar resources
Split-Ordered Lists: Lock-Free Extensible Hash Tables (a summary)
The hash table is a common data structure for applications requiring constant time insert, delete and find of certain items. The basic idea is to use an array of buckets, where each bucket stores a list of items. A hash function is used to determine which bucket a certain item belongs to. Finding or deleting a specific item amounts to a linear search in a bucket list pointed to by the hash func...
Robust and Efficient Locality Sensitive Hashing for Nearest Neighbor Search in Large Data Sets
Locality sensitive hashing (LSH) has been used extensively as a basis for many data retrieval applications. However, previous approaches, such as random projection and multi-probe hashing, may exhibit high query complexity of up to Θ(n) when the underlying data distribution is highly skewed. This is due to the imbalance in the number of data items stored per bucket, which leads to slow query tim...
Hashing and Rehashing in Emulated Shared Memory
The PRAM model is widely used to formulate parallel algorithms because of its shared memory and its synchronous behaviour. The model however bears little resemblance to real parallel machines. This has led to various approaches to emulating PRAMs on processor networks. We briefly survey the principles behind these emulations and show why hashing is an important part of them. We discuss the commo...
Parallel Algorithms for Bucket Sorting and the Data Dependent Prefix Problem
The data dependent prefix problem is to compute all the n initial products x_1 ∘ x_2 ∘ ... ∘ x_k, 1 ≤ k ≤ n, where the order is specified by a linked list. A parallel algorithm for the data dependent prefix problem is presented. This algorithm has time complexity O( n p + log n log n p ) using p processors on the exclusive-read exclusive-write computation model. A bucket sorting algorithm is also develo...
Comparison of Bucket Sort and RADIX Sort
Bucket sort and RADIX sort are two well-known integer sorting algorithms. This paper empirically measures their time usage and memory consumption for different kinds of input sequences. The algorithms are compared both from a theoretical standpoint and on how well they perform in six different use cases using randomized sequences of numbers. The measurements provide data on how good they ...
Journal title:
- CoRR
Volume abs/1509.02235 Issue
Pages -
Publication date 2011